Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features
نویسندگان
چکیده
It is very attractive but technically very challenging if the user can retrieve photos from a huge collection using high-level personal queries (e.q. ”Uncle Bill’s House”). This paper proposes a set of approaches to achieve the goal assuming only 30% of the photos are annotated by sparse and spontaneously spoken descriptions when the photos are taken. We fuse the visual features (visual words plus global visual concepts from Columbia 374 detector) with the sparse speech features and train latent topics for the photos in the collection using non-negative matrix factorization. The retrieved results are then enhanced by twolayer mutually reinforced random walk based on different types of features. In this way it becomes possible to retrieve photos without speech annotation or with sparse annotations regarding different categories of information (e.g. where and who) because of the fused visual/speech features and jointly trained latent topics. Very encouraging results were obtained in initial experiments.
منابع مشابه
Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors
It is very attractive for the user to retrieve photos from a huge collection using high-level personal queries (e.g. “uncle Bill’s house”), but technically very challenging. Previous works proposed a set of approaches toward the goal assuming only 30% of the photos are annotated by sparse spoken descriptions when the photos are taken. In this paper, to promote the interaction between different ...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملImage Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...
متن کاملSemantic Image Annotation and Retrieval with IKen
In this paper we introduce IKen, a platform for image retrieval enhanced by semantic annotations. The paper discusses work in progress and reports the current state of IKen. This comprises the functionalities for annotating photos with an underlying ontology, search features based on these annotations and the development of the domain ontology used for annotation.
متن کاملMultimodal and Mobile Personal Image Retrieval: A User Study
Mobile phones have become multimedia devices. Therefore it is not uncommon to observe users capturing photos and videos on their mobile phones. As the amount of digital multimedia content expands, it becomes increasingly difficult to find specific images in the device. In this paper, we present our experience with MAMI, a mobile phone prototype that allows users to annotate and search for digit...
متن کامل